The GROUP BY clause has been part of the SELECT statement syntax since the earliest dialects of T-SQL. You use GROUP BY
to create queries that collapse multiple rows belonging to the same
group into a single summary row and perform aggregate calculations (such
as SUM and AVG) across the individual rows of each group. SQL Server 6.5 later extended the GROUP BY clause by adding the WITH CUBE and WITH ROLLUP
operators. These operators perform additional grouping and aggregation
of data in standard relational queries, similar to what is provided by
online analytical processing (OLAP) queries that slice and dice your
data into Analysis Services cubes, but without ever leaving the
relational database world. SQL Server 2008 added the GROUPING SETS operator that further extends the capabilities of the GROUP BY clause for summarizing and analyzing your data.
In this section, you will examine GROUP BY in many of its variant forms. You’ll start with the basic GROUP BY clause, and then you’ll learn how the WITH CUBE and WITH ROLLUP operators can be used to enhance those summary results. Then you’ll explore the GROUPING SETS operator added in SQL Server 2008.
Start
with a simple inventory table that contains quantities for various
items in diverse colors that are available at different store locations,
as shown in Example 1.
Example 1. Creating the Inventory table.
CREATE TABLE Inventory(
Store varchar(2),
Item varchar(20),
Color varchar(10),
Quantity decimal)
Next, add some inventory data.
There are 13 rows that contain inventory for tables, chairs, and sofas
available in blue, red, and green, at NY, NJ, and PA locations, as shown
in Example 2.
Example 2. Populating the Inventory table.
INSERT INTO Inventory VALUES('NY', 'Table', 'Blue', 124)
INSERT INTO Inventory VALUES('NJ', 'Table', 'Blue', 100)
INSERT INTO Inventory VALUES('NY', 'Table', 'Red', 29)
INSERT INTO Inventory VALUES('NJ', 'Table', 'Red', 56)
INSERT INTO Inventory VALUES('PA', 'Table', 'Red', 138)
INSERT INTO Inventory VALUES('NY', 'Table', 'Green', 229)
INSERT INTO Inventory VALUES('PA', 'Table', 'Green', 304)
INSERT INTO Inventory VALUES('NY', 'Chair', 'Blue', 101)
INSERT INTO Inventory VALUES('NJ', 'Chair', 'Blue', 22)
INSERT INTO Inventory VALUES('NY', 'Chair', 'Red', 21)
INSERT INTO Inventory VALUES('NJ', 'Chair', 'Red', 10)
INSERT INTO Inventory VALUES('PA', 'Chair', 'Red', 136)
INSERT INTO Inventory VALUES('NJ', 'Sofa', 'Green', 2)
Now use a basic GROUP BY clause to query on this data:
SELECT Item, Color, SUM(Quantity) AS TotalQty, COUNT(Store) AS Stores
FROM Inventory
GROUP BY Item, Color
ORDER BY Item, Color
As implied by its syntax, this query
groups all the inventory records by item and then by color within each
item. The result set therefore includes one summary row for each unique
combination of items and colors. The store location is not included in
the grouping, and so the results returned by the query apply to all stores. Each summary row includes a TotalQty column calculated by the SUM aggregate function as the total quantity for all rows of the same item and color across all stores. The last column, Stores, is calculated by the COUNT
aggregate function as the number of store locations at which each
unique combination of items and colors is available, as shown here:
Item Color TotalQty Stores
-------------------- ---------- ------------------------------ ------
Chair Blue 123 2
Chair Red 167 3
Sofa Green 2 1
Table Blue 224 2
Table Green 533 2
Table Red 223 3
(6 row(s) affected)
These results show that SQL Server collapsed the inventory records sharing the same item and color into a single summary row. Because you did not group by store location, each summary row applies to all stores. For each group, the total quantity is the sum of the individual quantity values for that item and color combination, and the store count is the number of store locations at which the combination is available. With GROUP BY, every column returned by the query must be either one of the columns being grouped by (such as the Item and Color columns here) or an aggregate function that operates across all the member rows of the group (such as the SUM(Quantity) and COUNT(Store) functions).
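The grouping rule is easiest to see by breaking it. The following is a minimal sketch against the Inventory table from Example 1; the alias FirstStore is illustrative and not part of the original example. The first query is left commented out because SQL Server rejects it (error 8120), and the two queries after it show the usual ways to make it valid.

```sql
-- Invalid: Store is neither grouped nor aggregated, so SQL Server
-- rejects the query (error 8120) instead of guessing which store to show.
-- SELECT Item, Color, Store, SUM(Quantity) AS TotalQty
--  FROM Inventory
--  GROUP BY Item, Color;

-- Fix 1: add Store to the grouping columns (one row per store/item/color).
SELECT Store, Item, Color, SUM(Quantity) AS TotalQty
 FROM Inventory
 GROUP BY Store, Item, Color
 ORDER BY Store, Item, Color;

-- Fix 2: keep the grouping as it was and wrap Store in an aggregate.
-- FirstStore is a hypothetical alias: the alphabetically first store
-- that carries each item/color combination.
SELECT Item, Color, MIN(Store) AS FirstStore, SUM(Quantity) AS TotalQty
 FROM Inventory
 GROUP BY Item, Color
 ORDER BY Item, Color;
```

Which fix is appropriate depends on the question being asked: the first changes the level of detail in the result, whereas the second keeps one row per item and color.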
This query demonstrates the most basic application of the GROUP BY clause, which simply groups and aggregates. It answers the question “How many of each item are in stock in each color, across all store locations?” by grouping on item and color. The WITH ROLLUP and WITH CUBE operators (which were introduced in SQL Server 6.5) can be used to answer more questions than that. Each of these operators supplements the results of an ordinary GROUP BY clause with additional summary aggregations on the underlying data.
Here is the same query you ran before, only this time using WITH ROLLUP:
SELECT Item, Color, SUM(Quantity) AS TotalQty, COUNT(Store) AS Stores
FROM Inventory
GROUP BY Item, Color WITH ROLLUP
ORDER BY Item, Color
GO
Item Color TotalQty Stores
-------------------- ---------- ------------------------------ ------
NULL NULL 1272 13
Chair NULL 290 5
Chair Blue 123 2
Chair Red 167 3
Sofa NULL 2 1
Sofa Green 2 1
Table NULL 980 7
Table Blue 224 2
Table Green 533 2
Table Red 223 3
(10 row(s) affected)
This time, you receive the same six grouped results as before, supplemented with four additional rollup rows (the ones with NULL values for Item or Color). Rollup rows contain additional higher-level summary information that essentially “groups the groups” of the query results. Because the Inventory data itself contains no NULL values, any row with NULL values in it is a rollup row, and the NULL should be thought of as “all values” in this context.
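Reading NULL as “all values” works only while the grouped columns never contain a genuine NULL. As a hedged sketch, the T-SQL GROUPING function (which returns 1 when a column has been rolled up in a given row and 0 otherwise) can be used to label rollup rows explicitly. The aliases ItemLabel and ColorLabel and the label text are illustrative choices, not part of the original example.

```sql
-- Replace rollup NULLs with explicit labels by testing GROUPING(),
-- which is 1 only on rows where that column has been rolled up.
SELECT
  CASE WHEN GROUPING(Item) = 1 THEN '(all items)' ELSE Item END AS ItemLabel,
  CASE WHEN GROUPING(Color) = 1 THEN '(all colors)' ELSE Color END AS ColorLabel,
  SUM(Quantity) AS TotalQty,
  COUNT(Store) AS Stores
 FROM Inventory
 GROUP BY Item, Color WITH ROLLUP
 ORDER BY Item, Color;
```

The query returns the same ten rows as before; only the presentation of the rollup rows changes.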
In these results, the first row is the top-level rollup, as indicated by NULL values for both Item and Color. This top-level rollup reports a grand total quantity of 1,272 for the entire set (all items in all colors) in all store locations, where the entire set consists of the 13 inventory rows (one for each store/item/color combination).
The next result is an item-level rollup for chairs. It reports a total quantity of 290 for chairs in all colors, with a Stores value of 5. Note that COUNT(Store) counts the rows in each group rather than distinct stores, so at rollup levels the same store is counted once for each color it stocks (COUNT(DISTINCT Store) would count each store only once). The two results that follow are the same summary rows for chairs returned by the first “plain” GROUP BY query, which were just rolled up. They show 123 blue chairs in 2 locations and 167 red chairs in 3 locations. The next result is an item-level rollup for sofas. Only one store location carries sofas, and they’re available only in green. The sofa rollup therefore contains the same values as the one and only summary row for 2 green sofas available in 1 location. The last set of rows reports on tables in the same way. This includes an item-level rollup showing 980 tables with a Stores value of 7, followed by the summary rows showing 224 blue tables in 2 locations, 533 green tables in 2 locations, and 223 red tables in 3 locations.
So by simply adding WITH ROLLUP, you can answer a second question that the first ordinary GROUP BY query couldn’t: “How many chairs, tables, and sofas are in stock, regardless of color?”
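To make the idea of “grouping the groups” concrete, the rollup result can be reproduced by hand as three separate aggregations stitched together. This is only a sketch of what WITH ROLLUP is logically equivalent to for this query, not a description of how SQL Server actually executes it; the explicit CAST calls are an assumption added so that the NULL placeholders match the column types from Example 1.

```sql
-- Detail level: one row per item/color combination.
SELECT Item, Color, SUM(Quantity) AS TotalQty, COUNT(Store) AS Stores
 FROM Inventory
 GROUP BY Item, Color
UNION ALL
-- Item-level rollup: Color is collapsed, so it is reported as NULL.
SELECT Item, CAST(NULL AS varchar(10)), SUM(Quantity), COUNT(Store)
 FROM Inventory
 GROUP BY Item
UNION ALL
-- Top-level rollup: both columns collapsed into a single grand-total row.
SELECT CAST(NULL AS varchar(20)), CAST(NULL AS varchar(10)),
       SUM(Quantity), COUNT(Store)
 FROM Inventory
 ORDER BY Item, Color;
```

The final ORDER BY applies to the combined result, so the ten rows come back in the same order as the WITH ROLLUP output.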
2. Rolling Up All Level Combinations
Using WITH CUBE now instead of WITH ROLLUP takes this result set to the next level, as shown here:
SELECT Item, Color, SUM(Quantity) AS TotalQty, COUNT(Store) AS Stores
FROM Inventory
GROUP BY Item, Color WITH CUBE
ORDER BY Item, Color
GO
Item Color TotalQty Stores
-------------------- ---------- ------------------------------ ------
NULL NULL 1272 13
NULL Blue 347 4
NULL Green 535 3
NULL Red 390 6
Chair NULL 290 5
Chair Blue 123 2
Chair Red 167 3
Sofa NULL 2 1
Sofa Green 2 1
Table NULL 980 7
Table Blue 224 2
Table Green 533 2
Table Red 223 3
(13 row(s) affected)
You now have the same result set returned by WITH ROLLUP, plus three more rollup rows (the ones with NULL for Item and a value for Color). Let’s look at exactly what SQL Server did. By applying WITH CUBE, you instructed the database engine to create a multidimensional representation of the data on the fly, which is loosely referred to as a cube. The number of dimensions in the cube is based on the number of grouping columns. This inventory example has only two dimensions, but a query could have many more if it specifies more grouping columns. A cube contains rollups for all possible combinations of the grouping columns, not just the nested levels implied by the order in which the columns are listed in the GROUP BY clause.
So WITH CUBE returns the same rollups returned by WITH ROLLUP—which
includes all items regardless of color—plus additional rollups for all
colors regardless of item. As a result, you can now answer a third
question that the earlier GROUP BY queries couldn’t: “How many items of any
type in a particular color are in stock?” That means that you can now
also see how many blue, green, or red items you have in inventory
regardless of whether they are chairs, sofas, or tables.
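The three rows that WITH CUBE adds for this query correspond to grouping by Color alone. As a quick check, the following plain GROUP BY query (a sketch against the same Inventory table) should return the same totals as the color-level rollup rows, namely 347 blue, 535 green, and 390 red items.

```sql
-- Color-level totals only: the grouping that WITH CUBE adds
-- on top of what WITH ROLLUP already returns for (Item, Color).
SELECT Color, SUM(Quantity) AS TotalQty, COUNT(Store) AS Stores
 FROM Inventory
 GROUP BY Color
 ORDER BY Color;
```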
Because a cube rolls up every possible combination of grouping columns, independent of the order of levels expressed with GROUP BY, each additional grouping column doubles the number of grouping combinations (n columns yield 2^n of them), so the result set can grow very quickly. For example, if you modify the query to group by store location as well, SQL Server returns 44 rows including rollups for every possible combination of values across the three grouping columns Store, Item, and Color, as follows:
SELECT Store, Item, Color, SUM(Quantity) AS TotalQty
FROM Inventory
GROUP BY Store, Item, Color WITH CUBE
ORDER BY Store, Item, Color
GO
Store Item Color TotalQty
----- -------------------- ---------- ------------------------------
NULL NULL NULL 1272
NULL NULL Blue 347
NULL NULL Green 535
NULL NULL Red 390
NULL Chair NULL 290
NULL Chair Blue 123
NULL Chair Red 167
NULL Sofa NULL 2
NULL Sofa Green 2
NULL Table NULL 980
NULL Table Blue 224
NULL Table Green 533
NULL Table Red 223
NJ NULL NULL 190
NJ NULL Blue 122
NJ NULL Green 2
NJ NULL Red 66
NJ Chair NULL 32
NJ Chair Blue 22
NJ Chair Red 10
NJ Sofa NULL 2
NJ Sofa Green 2
NJ Table NULL 156
NJ Table Blue 100
NJ Table Red 56
NY NULL NULL 504
NY NULL Blue 225
NY NULL Green 229
NY NULL Red 50
NY Chair NULL 122
NY Chair Blue 101
NY Chair Red 21
NY Table NULL 382
NY Table Blue 124
NY Table Green 229
NY Table Red 29
PA NULL NULL 578
PA NULL Green 304
PA NULL Red 274
PA Chair NULL 136
PA Chair Red 136
PA Table NULL 442
PA Table Green 304
PA Table Red 138
(44 row(s) affected)
These results can now answer inventory
questions for every conceivable combination of grouping levels. For
example, across all locations, there are 347 blue items (tables, chairs,
and sofas), 290 chairs (all colors), and 533 green tables, whereas in
NY specifically, there are 50 red items, 382 tables, and 124 blue
tables, and so on. Every combination of store location, item, and color, along with their rollups, is returned by this single query.
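One way to see how the 44 rows break down by rollup level is to count them with the GROUPING function, which returns 1 when a column has been rolled up in a given row and 0 otherwise. The following is a sketch only; the derived-table alias c and the column names RolledUpColumns and RowsReturned are illustrative and do not appear in the original query.

```sql
-- Count the cube's rows by how many of the three grouping
-- columns have been rolled up (0 = detail row, 3 = grand total).
SELECT RolledUpColumns, COUNT(*) AS RowsReturned
 FROM (
   SELECT GROUPING(Store) + GROUPING(Item) + GROUPING(Color) AS RolledUpColumns
    FROM Inventory
    GROUP BY Store, Item, Color WITH CUBE
 ) AS c
 GROUP BY RolledUpColumns
 ORDER BY RolledUpColumns;
```

With the sample data, the counts should be 13 detail rows, 21 rows with one column rolled up, 9 rows with two columns rolled up, and 1 grand-total row, which together account for all 44 rows in the listing above.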